Unsupervised Text Syllabification with Information from Audio

Authors

  • Mircea Giurgiu
  • Adriana Stan
Abstract

The objectives of this internship were to study supervised and unsupervised text syllabification techniques, with a focus on unsupervised methods; to implement and evaluate several unsupervised audio syllabification algorithms; to implement and evaluate an unsupervised text syllabification algorithm; and to improve the performance of text syllabification with acoustic features. In this report I briefly present my work, the results obtained, and my conclusions on the topic.

Similar resources

Morphological Segmentation Can Improve Syllabification

Syllabification is sometimes influenced by morphological boundaries. We show that incorporating morphological information can improve the accuracy of orthographic syllabification in English and German. Surprisingly, unsupervised segmenters, such as Morfessor, can be more useful for this purpose than the supervised ones.

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Automatic speech recognition (ASR) has been widely researched with supervised approaches, while many low-resourced languages lack audio-text aligned data, and supervised methods cannot be applied to them. In this work, we propose a framework to achieve unsupervised ASR on a read English speech dataset, where audio and text are unaligned. In the first stage, each word-level audio segment in the u...

Multilingual syllabification using weighted finite-state transducers

This paper describes an approach to syllabification that has been incorporated into the English and German text-to-speech systems at Bell Labs. Implemented as a weighted finite-state transducer, the syllabifier is easily integrated, via mathematical composition, into the finite-state based text analysis component of the text-to-speech system. The weights are based on frequencies of onset, nucl...

An Automated Video Classification and Annotation Using Embedded Audio for Content Based Retrieval

Efficient and effective video classification and annotation demands automated unsupervised classification and annotation of videos based on their embedded content, as manual indexing is unfeasible. Audio is a rich source of information in digital videos that can provide useful descriptors for indexing video databases. Audio archives contrast with image or video archives in a number of...

Using Syllables As Indexing Terms in Full-Text Information Retrieval

This paper describes empirical results of information retrieval in 13 languages of the Cross Language Evaluation Forum (CLEF) collection, augmented with results for Turkish, using syllables as a means to manage morphological variation in the languages. This kind of approach has been used in speech retrieval (e.g. Larson and Eickeler 2003), but for some reason it has not been tried out much in text...

Publication date: 2014